Current Issue: January - March 2014, Volume: 2014, Issue Number: 1, Articles: 4
This paper proposes a novel and robust voice activity detection (VAD) algorithm utilizing a long-term spectral flatness measure (LSFM), which is capable of working at 10 dB and lower signal-to-noise ratios (SNRs). This new LSFM-based VAD improves speech detection robustness in various noisy environments by employing a low-variance spectrum estimate and an adaptive threshold. The discriminative power of the new LSFM feature is shown by an analysis of the speech/non-speech LSFM distributions. The proposed algorithm was evaluated under 12 types of noise (11 from NOISEX-92 plus speech-shaped noise) and five SNR levels on the core TIMIT test corpus. Comparisons with three modern standardized algorithms (ETSI adaptive multi-rate (AMR) options AMR1 and AMR2, and ITU-T G.729) demonstrate that our proposed LSFM-based VAD scheme achieved the best average accuracy rate. A long-term signal variability (LTSV)-based VAD scheme is also compared with our proposed method. The results show that our proposed algorithm outperforms the LTSV-based VAD scheme for most of the noises considered, including difficult noises like machine gun noise and speech babble noise...
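The central quantity in such a scheme, a spectral flatness measure taken over a long-term window of frames, can be sketched as below. This is a hedged illustration of the general idea, not the paper's exact algorithm: the window length `R`, the log base, and the summation over bins are assumptions.

```python
import numpy as np

def long_term_spectral_flatness(frames_psd, R=30):
    """Long-term spectral flatness over a sliding window of R frames.

    frames_psd: (num_frames, num_bins) array of per-frame power spectra.
    For each frequency bin, the geometric-to-arithmetic mean ratio of the
    last R frames is computed (1 for a perfectly stationary bin, smaller
    for a modulated one, e.g. speech); the log ratios are summed over
    bins, giving one value per analysis position (0 means maximally flat).
    """
    eps = 1e-12  # guard against log(0)
    num_frames, _ = frames_psd.shape
    lsfm = []
    for n in range(R - 1, num_frames):
        window = frames_psd[n - R + 1 : n + 1] + eps  # last R frames
        am = window.mean(axis=0)                      # arithmetic mean per bin
        gm = np.exp(np.log(window).mean(axis=0))      # geometric mean per bin
        lsfm.append(np.sum(np.log10(gm / am)))        # always <= 0
    return np.array(lsfm)
```

A VAD built on this would compare each value against an adaptive threshold: stationary noise yields values near zero, while the temporal modulation of speech drives the measure strongly negative.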
Devices such as smartphones and tablet PCs of various sizes have become increasingly popular, finding new applications, including in-car audio systems. This paper proposes a new car audio system. In this architecture, music data is stored in an online database, from which users can select a genre of music or a playlist through a 2D interface. A self-organizing map, based on a personalized distance function and music content, is used to map music tracks onto the interface. With this data model and interface, drivers can easily select the type of music they want to listen to without interfering with their driving. Artificial neural networks record and analyze user preferences, allowing the system to select and order the music tracks to be played automatically. Experiments have shown that the system satisfies user requirements...
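The mapping step relies on a self-organizing map placing tracks with similar features at nearby cells of a 2D grid. A minimal sketch of that idea follows; plain Euclidean distance stands in for the paper's personalized distance function, and the grid size, learning-rate schedule, and feature vectors are assumptions.

```python
import numpy as np

def train_som(data, grid_h=4, grid_w=4, epochs=50, lr0=0.5, sigma0=2.0, seed=0):
    """Minimal self-organizing map: items with similar feature vectors
    end up at nearby cells of a 2D grid, which is the idea behind laying
    music tracks out on a 2D selection interface."""
    rng = np.random.default_rng(seed)
    dim = data.shape[1]
    weights = rng.random((grid_h, grid_w, dim))
    # (grid_h, grid_w, 2) array of each cell's grid coordinates
    coords = np.stack(np.meshgrid(np.arange(grid_h), np.arange(grid_w),
                                  indexing="ij"), axis=-1)
    for epoch in range(epochs):
        lr = lr0 * (1 - epoch / epochs)            # decaying learning rate
        sigma = sigma0 * (1 - epoch / epochs) + 0.5  # shrinking neighbourhood
        for x in data[rng.permutation(len(data))]:
            # best-matching unit: cell whose weight vector is closest to x
            dists = np.linalg.norm(weights - x, axis=-1)
            bmu = np.unravel_index(np.argmin(dists), dists.shape)
            # pull the BMU and its grid neighbours towards x
            grid_d2 = ((coords - np.array(bmu)) ** 2).sum(axis=-1)
            h = np.exp(-grid_d2 / (2 * sigma ** 2))[..., None]
            weights += lr * h * (x - weights)
    return weights

def map_to_cell(weights, x):
    """Grid cell where an item (e.g. a music track) lands."""
    dists = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(dists), dists.shape)
```

After training, `map_to_cell` gives each track a position on the interface grid, so dissimilar tracks separate spatially.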
Many features have been proposed for speech-based emotion recognition, and a majority of them are frame based or statistics estimated from frame-based features. Temporal information is typically modelled on a per-utterance basis, with either functionals of frame-based features or a suitable back-end. This paper investigates an approach that combines both, with the use of temporal contours of parameters extracted from a three-component model of speech production as features in an automatic emotion recognition system using a hidden Markov model (HMM)-based back-end. Consequently, the proposed system models information on a segment-by-segment scale that is larger than a frame-based scale but smaller than utterance-level modelling. Specifically, linear approximations to temporal contours of formant frequencies, glottal parameters and pitch are used to model short-term temporal information over individual segments of voiced speech. This is followed by the use of HMMs to model longer-term temporal information contained in sequences of voiced segments. Listening tests were conducted to validate the use of linear approximations in this context. Automatic emotion classification experiments were carried out on the Linguistic Data Consortium emotional prosody speech and transcripts corpus and the FAU Aibo corpus to validate the proposed approach...
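The segment-level linear approximation step can be sketched as a least-squares line fit per voiced segment: each contour (pitch, a formant track, a glottal parameter) collapses to an intercept and a slope, and the sequence of such vectors then feeds an HMM back-end. The function name and two-number summary are illustrative, not the paper's exact feature set.

```python
import numpy as np

def linear_contour_features(contour):
    """Fit a straight line to one parameter contour (e.g. a pitch track
    over a voiced segment) and return (intercept, slope) as a compact
    segment-level description of its temporal trajectory."""
    t = np.arange(len(contour), dtype=float)   # frame index as time axis
    slope, intercept = np.polyfit(t, contour, 1)  # degree-1 least squares
    return intercept, slope
```

For example, a steadily rising pitch contour yields a positive slope, while a flat contour yields a slope near zero; the per-segment pairs become the observation vectors of the HMM.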
Query-by-Example Spoken Term Detection (QbE STD) aims at retrieving data from a speech data repository given an acoustic query containing the term of interest as input. Nowadays, it has been receiving much interest due to the high volume of information stored in audio or audiovisual format. QbE STD differs from automatic speech recognition (ASR) and keyword spotting (KWS)/spoken term detection (STD), since ASR is interested in all the terms/words that appear in the speech signal and KWS/STD relies on a textual transcription of the search term to retrieve the speech data. This paper presents the systems submitted to the ALBAYZIN 2012 QbE STD evaluation, held as a part of the ALBAYZIN 2012 evaluation campaign within the context of the IberSPEECH 2012 Conference. The evaluation consists of retrieving the speech files that contain the input queries, indicating their start and end timestamps within the appropriate speech file. Evaluation is conducted on a Spanish spontaneous speech database containing a set of talks from MAVIR workshops, which amount to about 7 h of speech in total. We present the database, the metric and the systems submitted, along with all results and some discussion. Four different research groups took part in the evaluation. Evaluation results show the difficulty of this task, and the limited performance indicates there is still a lot of room for improvement. The best result is achieved by a dynamic time warping-based search over Gaussian posteriorgrams/posterior phoneme probabilities. This paper also compares the systems, aiming to establish the best technique for dealing with this difficult task and to define promising directions for this relatively novel task...
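The best-performing approach, dynamic time warping over posteriorgrams, can be sketched as below. This is a generic illustration, not any submitted system: the negative-log inner-product local distance and the path-length normalisation are common choices in posteriorgram-based QbE STD, assumed here.

```python
import numpy as np

def dtw_cost(query, ref):
    """Dynamic time warping alignment cost between a query posteriorgram
    and a reference posteriorgram (each frames x posterior-dims).

    Local distance is -log of the inner product of posterior vectors;
    the accumulated cost allows the usual insertion, deletion, and
    diagonal moves, and is normalised by the path-length bound."""
    eps = 1e-8
    d = -np.log(np.clip(query @ ref.T, eps, None))  # local distance matrix
    Q, R = d.shape
    acc = np.full((Q, R), np.inf)
    acc[0, 0] = d[0, 0]
    for i in range(Q):
        for j in range(R):
            if i == 0 and j == 0:
                continue
            best = min(
                acc[i - 1, j] if i > 0 else np.inf,          # query frame skip
                acc[i, j - 1] if j > 0 else np.inf,          # reference frame skip
                acc[i - 1, j - 1] if i > 0 and j > 0 else np.inf,  # match
            )
            acc[i, j] = d[i, j] + best
    return acc[-1, -1] / (Q + R)  # length-normalised alignment cost
```

In an actual detection system, a subsequence variant (letting the alignment start and end anywhere in the reference) is slid over each talk, and low-cost regions are reported with their start and end timestamps.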